feat(stats): Add 1% low FPS tracking#2661
Conversation
|
| Filename | Overview |
|---|---|
| Generals/Code/GameEngineDevice/Source/W3DDevice/GameClient/W3DDisplay.cpp | Core FPS metric refactor: replaces 30-frame static history with a 4096-entry ring buffer; contains a static local timer not migrated to a member variable like the rest of the statics. |
| GeneralsMD/Code/GameEngineDevice/Source/W3DDevice/GameClient/W3DDisplay.cpp | Mirror of Generals W3DDisplay changes; same static local timer issue and single-line if style violations. |
| Generals/Code/GameEngineDevice/Include/W3DDevice/GameClient/W3DDisplay.h | Adds ring buffer member arrays; has a stale 0.5s comment and spurious extra semicolon in the enum declaration. |
| GeneralsMD/Code/GameEngineDevice/Include/W3DDevice/GameClient/W3DDisplay.h | Same as Generals header; same stale comment and double-semicolon typo. |
| Generals/Code/GameEngine/Source/GameClient/InGameUI.cpp | Adds m_renderFpsLowString, INI color config, and draw calls; refactors FPS limit display to show [X] for uncapped. Clean implementation. |
| GeneralsMD/Code/GameEngine/Source/GameClient/InGameUI.cpp | Analogous to Generals InGameUI but uses Unicode arrows and a different default color, diverging from Generals despite the PR claiming identical code. |
| Core/GameEngine/Include/GameClient/Display.h | Adds pure virtual getLow1PercentFPS() to the Display interface; straightforward and correct. |
| Generals/Code/Tools/GUIEdit/Include/GUIEditDisplay.h | Stub getLow1PercentFPS() returning 0 for the GUI editor display, consistent with other stubs. |
Sequence Diagram
sequenceDiagram
participant Draw as W3DDisplay::draw()
participant Metrics as updatePerformanceMetrics()
participant Sample as addFpsSample()
participant Avg as calculateAverageFPS(1.0s)
participant Low as calculateLow1PercentFPS(3.0s)
participant UI as InGameUI::drawRenderFps()
participant Disp as Display interface
Draw->>Metrics: called every frame
Metrics->>Sample: elapsedSeconds
Sample->>Sample: write fpsHistory and durationHistory at offset
Sample-->>Metrics: ring buffer updated
Metrics->>Avg: "windowSeconds=1.0"
Avg-->>Metrics: m_averageFPS
Metrics->>Low: every 100ms via timeGetTime()
Low->>Low: nth_element on sortBuffer bottom 1%
Low-->>Metrics: m_low1PercentFPS
UI->>Disp: getAverageFPS()
Disp-->>UI: m_averageFPS
UI->>Disp: getLow1PercentFPS()
Disp-->>UI: m_low1PercentFPS
UI->>UI: draw e.g. 60 (52) [120]
Prompt To Fix All With AI
Fix the following 6 code review issues. Work through them one at a time, proposing concise fixes.
---
### Issue 1 of 6
Generals/Code/GameEngineDevice/Source/W3DDevice/GameClient/W3DDisplay.cpp:1069-1075
The `lastLowUpdate` timer was left as a `static` local while the rest of the old statics (`lastUpdateTime64`, `historyOffset`, `fpsHistory`) were all migrated to member variables in this very refactor. If `W3DDisplay` is torn down and re-created (display mode switch, windowed/fullscreen toggle, etc.) the static retains its previous value, meaning the new instance's `m_low1PercentFPS` will stay at its constructor-initialized value until the 100 ms interval elapses from that leftover timestamp. Promoting it to a member variable completes the refactor consistently.
```suggestion
UnsignedInt now = timeGetTime();
if (now - m_lastLowUpdate >= 100) // update low 1% metrics at 100ms intervals instead of 1000ms since it is now extremely cheap
{
m_lastLowUpdate = now;
m_low1PercentFPS = calculateLow1PercentFPS(3.0f);
}
```
### Issue 2 of 6
GeneralsMD/Code/GameEngineDevice/Source/W3DDevice/GameClient/W3DDisplay.cpp:1120-1126
Same static-local timer issue as in the Generals copy — `lastLowUpdate` should be a member variable `m_lastLowUpdate` to be consistent with the rest of the refactoring and to support clean re-initialization.
```suggestion
UnsignedInt now = timeGetTime();
if (now - m_lastLowUpdate >= 100) // update low 1% metrics at 100ms intervals instead of 1000ms since it is now extremely cheap
{
m_lastLowUpdate = now;
m_low1PercentFPS = calculateLow1PercentFPS(3.0f);
}
```
### Issue 3 of 6
Generals/Code/GameEngineDevice/Include/W3DDevice/GameClient/W3DDisplay.h:179-183
The `m_averageFPS` doc comment says "0.5s" but `updatePerformanceMetrics()` calls `calculateAverageFPS(1.0f)`. Also there is a spurious extra semicolon after the enum declaration (`}; ;`).
```suggestion
Real m_averageFPS; ///< average fps over the last 1.0s.
Real m_low1PercentFPS; ///< 1% low fps.
Real m_currentFPS; ///< current fps value.
enum { FPS_HISTORY_SIZE = 4096 }; // degrades gracefully beyond this size
```
### Issue 4 of 6
GeneralsMD/Code/GameEngineDevice/Include/W3DDevice/GameClient/W3DDisplay.h:179-183
Same stale "0.5s" doc comment and double-semicolon typo as in the Generals header.
```suggestion
Real m_averageFPS; ///< average fps over the last 1.0s.
Real m_low1PercentFPS; ///< 1% low fps.
Real m_currentFPS; ///< current fps value.
enum { FPS_HISTORY_SIZE = 4096 }; // degrades gracefully beyond this size
```
### Issue 5 of 6
Generals/Code/GameEngineDevice/Source/W3DDevice/GameClient/W3DDisplay.cpp:999-1001
These single-line `if` bodies violate the project rule requiring the statement to be on a separate line from the condition. The same pattern appears in `calculateLow1PercentFPS` at line 1022, and both occurrences are also present in the GeneralsMD copy.
```suggestion
if (idx < 0)
idx += FPS_HISTORY_SIZE;
timeSum += m_durationHistory[idx];
samples++;
```
### Issue 6 of 6
GeneralsMD/Code/GameEngine/Source/GameClient/InGameUI.cpp:6047-6063
**Generals vs GeneralsMD display format divergence**
The PR description states "Both Generals and GeneralsMD implementations are included in this PR with identical code," but the two are visibly different. GeneralsMD formats the 1% low as `▼N` (`\x25BC%u`) and the FPS cap as `▲N`/`▲X`, while Generals uses `(%u)` and `[N]`/`[X]`. The default `m_renderFpsLowColor` also differs — `(180, 170, 120, 255)` in Generals vs `(190, 180, 130, 255)` in GeneralsMD. Are the Unicode arrows intentional for Zero Hour only, or is one of these meant to mirror the other?
Reviews (8): Last reviewed commit: "Buffer Footprint: Changed FPS history si..." | Re-trigger Greptile
Move FPS history state into W3DDisplay members. Implement accurate time-based windowing for frame metrics. Use ceiling logic for improved 1% low accuracy. Optimize percentile calculation using efficient selection algorithm. Rename and centralize performance update call sites. Increase history buffer for stable high-FPS monitoring.
Update average FPS math to use time-weighted mean. Move sortBuffer to class members for consistency.
| Real m_low1PercentFPS; ///<1% low fps. | ||
| Real m_currentFPS; ///<current fps value. | ||
|
|
||
| enum { FPS_HISTORY_SIZE = 5000 }; // covers 5s at 1000 FPS, degrades gracefully beyond |
There was a problem hiding this comment.
I think this number needs evaluation but I would like input from others.
As it stands, 5000 samples in three Real arrays require 60 kB in memory - just for the FPS trackers.
I think the question should be: given 3 seconds timeframe for the 1% low FPS, At what 3-second average FPS is the low fps no longer relevant. Say you average 300 fps, what is the chance that the 1% is so low that it is still relevant as a performance metric. If 300 fps is the upper bound, only 900 samples need to be stored. That's only 18% of the memory needed compared to the current setting.
There was a problem hiding this comment.
Say you average 300 fps, what is the chance that the 1% is so low that it is still relevant as a performance metric.
It's pretty common to see 300 average fps and 30 fps lows in this game in large skirmish matches even on vs2022 non-retail.
But definitely would like to hear multiple inputs for this number.
There was a problem hiding this comment.
The implementation approach is fine, but the data size requirements for the fps counters are outrageous.
I inquired a bit with Chat Gippy and it would be possible to reduce size requirements by quantizing samples and moving intermediate samples into timed buckets.
Here is a sample FrameBucket with avg and min frame time, stored in 2 bytes each, with a resolution of 16 micro seconds.
// Compact frametime statistics bucket.
//
// Stores average and maximum frametime values using 16-bit integers
// with fixed-point quantization.
//
// ------------------------------------------------------------------
// Encoding
// ------------------------------------------------------------------
//
// Values are stored in units of 16 microseconds:
//
// stored_value = frametime_us / 16
//
// This allows a uint16_t to represent:
//
// 65535 * 16 us = 1,048,560 us
// = ~1048 ms
//
// which comfortably covers extremely slow frames (~1 FPS).
//
// ------------------------------------------------------------------
// Precision
// ------------------------------------------------------------------
//
// Quantization step:
//
// 16 us = 0.016 ms
//
// Maximum quantization error:
//
// +/- 8 us
//
// This is effectively negligible for FPS telemetry.
//
// Example:
//
// 16.667 ms frame (60 FPS)
//
// Encoded as:
//
// 16667 us / 16 = 1042
//
// Decoded:
//
// 1042 * 16 = 16672 us
//
// Error:
//
// +5 us = 0.005 ms
//
// ------------------------------------------------------------------
// Why fixed-point quantization?
// ------------------------------------------------------------------
//
// Advantages:
//
// - extremely compact (4 bytes total per bucket)
// - deterministic runtime cost
// - cache friendly
// - no floating-point storage
// - sufficient precision for telemetry
// - supports >1 second frametimes
//
// ------------------------------------------------------------------
// Intended usage
// ------------------------------------------------------------------
//
// Typical bucket generation:
//
// avgFrameUs = accumulatedFrameUs / frameCount
// maxFrameUs = worstFrameUsSeen
//
// Then:
//
// bucket.SetAvgUs(avgFrameUs);
// bucket.SetMaxUs(maxFrameUs);
//
// ------------------------------------------------------------------
// Recommended usage pattern
// ------------------------------------------------------------------
//
// Build buckets over fixed time windows:
//
// e.g. 50 ms or 100 ms
//
// rather than fixed frame counts.
//
// This keeps statistical resolution stable across varying FPS.
//
// ------------------------------------------------------------------
struct FrameBucket
{
uint16_t avg16us;
uint16_t max16us;
static constexpr uint32_t QUANTUM_US = 16;
static constexpr uint32_t QUANTUM_SHIFT = 4;
static constexpr uint32_t MAX_US =
0xFFFFu * QUANTUM_US;
// Encode microseconds into 16 us fixed-point units.
//
// Rounds to nearest unit.
//
// Input is clamped to representable range.
static uint16_t EncodeUs(uint32_t us)
{
if (us > MAX_US)
us = MAX_US;
return static_cast<uint16_t>(
(us + (QUANTUM_US / 2)) >> QUANTUM_SHIFT);
}
// Decode fixed-point units back into microseconds.
static uint32_t DecodeUs(uint16_t v)
{
return static_cast<uint32_t>(v)
<< QUANTUM_SHIFT;
}
void SetAvgUs(uint32_t us)
{
avg16us = EncodeUs(us);
}
void SetMaxUs(uint32_t us)
{
max16us = EncodeUs(us);
}
uint32_t GetAvgUs() const
{
return DecodeUs(avg16us);
}
uint32_t GetMaxUs() const
{
return DecodeUs(max16us);
}
float GetAvgMs() const
{
return static_cast<float>(GetAvgUs()) * 0.001f;
}
float GetMaxMs() const
{
return static_cast<float>(GetMaxUs()) * 0.001f;
}
float GetAvgFPS() const
{
const uint32_t us = GetAvgUs();
return (us > 0)
? (1000000.0f / static_cast<float>(us))
: 0.0f;
}
float GetMinFPS() const
{
const uint32_t us = GetMaxUs();
return (us > 0)
? (1000000.0f / static_cast<float>(us))
: 0.0f;
}
};And then we move these FrameBuckets into a time based array.
Sample implementation:
// ============================================================================
// Rolling 3-second FPS statistics
// ============================================================================
//
// Stores:
//
// 3 seconds @ 10 ms resolution
//
// = 300 buckets
//
// Each bucket summarizes:
//
// - average frametime
// - worst frametime
//
// over a 10 ms interval.
//
// ============================================================================
class FpsHistory
{
public:
static constexpr uint32_t BUCKET_INTERVAL_US = 10000; // 10 ms
static constexpr uint32_t HISTORY_SECONDS = 3;
static constexpr uint32_t BUCKET_COUNT =
(HISTORY_SECONDS * 1000000) / BUCKET_INTERVAL_US;
// ------------------------------------------------------------------------
void Reset()
{
m_writeIndex = 0;
m_bucketCount = 0;
m_accumulatedUs = 0;
m_accumulatedFrames = 0;
m_maxFrameUs = 0;
m_bucketElapsedUs = 0;
std::fill(
std::begin(m_buckets),
std::end(m_buckets),
FrameBucket{});
}
// ------------------------------------------------------------------------
// Add a frame
//
// frameUs:
// frametime in microseconds
//
// Example:
//
// 16.667 ms = 16667 us
//
// ------------------------------------------------------------------------
void AddFrame(uint32_t frameUs)
{
// Accumulate stats for current bucket.
m_accumulatedUs += frameUs;
m_accumulatedFrames++;
if (frameUs > m_maxFrameUs)
m_maxFrameUs = frameUs;
m_bucketElapsedUs += frameUs;
// Emit one or more buckets if enough time elapsed.
while (m_bucketElapsedUs >= BUCKET_INTERVAL_US)
{
EmitBucket();
m_bucketElapsedUs -= BUCKET_INTERVAL_US;
}
}
// ------------------------------------------------------------------------
float GetAverageFPS() const
{
if (m_bucketCount == 0)
return 0.0f;
uint64_t totalUs = 0;
for (uint32_t i = 0; i < m_bucketCount; ++i)
{
totalUs += m_buckets[i].GetAvgUs();
}
const float avgUs =
static_cast<float>(totalUs)
/ static_cast<float>(m_bucketCount);
return avgUs > 0.0f
? (1000000.0f / avgUs)
: 0.0f;
}
// ------------------------------------------------------------------------
// Approximate 1% low FPS
//
// Uses the worst bucket frametimes.
//
// This is intentionally approximate and designed for:
//
// - low memory use
// - deterministic runtime
// - good stutter detection
//
// ------------------------------------------------------------------------
float GetOnePercentLowFPS() const
{
if (m_bucketCount == 0)
return 0.0f;
uint32_t worstUs = 0;
// Find worst bucket maximum frametime.
for (uint32_t i = 0; i < m_bucketCount; ++i)
{
worstUs = std::max(
worstUs,
m_buckets[i].GetMaxUs());
}
return worstUs > 0
? (1000000.0f / static_cast<float>(worstUs))
: 0.0f;
}
private:
// ------------------------------------------------------------------------
void EmitBucket()
{
FrameBucket& bucket =
m_buckets[m_writeIndex];
// Compute average frametime for bucket.
const uint32_t avgUs =
(m_accumulatedFrames > 0)
? static_cast<uint32_t>(
m_accumulatedUs / m_accumulatedFrames)
: 0;
bucket.SetAvgUs(avgUs);
bucket.SetMaxUs(m_maxFrameUs);
// Advance ring buffer.
m_writeIndex =
(m_writeIndex + 1) % BUCKET_COUNT;
if (m_bucketCount < BUCKET_COUNT)
++m_bucketCount;
// Reset accumulators.
m_accumulatedUs = 0;
m_accumulatedFrames = 0;
m_maxFrameUs = 0;
}
private:
FrameBucket m_buckets[BUCKET_COUNT];
uint32_t m_writeIndex = 0;
uint32_t m_bucketCount = 0;
uint64_t m_accumulatedUs = 0;
uint32_t m_accumulatedFrames = 0;
uint32_t m_maxFrameUs = 0;
uint32_t m_bucketElapsedUs = 0;
};
// ============================================================================
// Example usage
// ============================================================================
int main()
{
FpsHistory history;
history.Reset();
// Simulate ~60 FPS.
for (int i = 0; i < 500; ++i)
{
uint32_t frameUs = 16667;
// Simulate occasional stutter.
if ((i % 120) == 0)
{
frameUs = 50000; // 50 ms hitch
}
history.AddFrame(frameUs);
}
printf(
"Average FPS: %.2f\n",
history.GetAverageFPS());
printf(
"Approx 1%% Low FPS: %.2f\n",
history.GetOnePercentLowFPS());
return 0;
}This data organization approach uses just 1200 bytes at most. Bucket intervals could also be longer for even less size.
It loses value accuracy. It optimizes for speed over accuracy.
However, your current implemention would perhaps be more efficient at low frame rates, because less values need reading at small frame counts. I do like that aspect, because low fps runtimes need to do less.
I suggest to look into this more how to find a good balance between size,speed and accuracy. There are many options here.
There was a problem hiding this comment.
Did some optimizations into this PR.
Switched to 4096 for now, 48kb is still very luxurious for a framerate counter, but the current implementation is very easy on the cpu.
|
I do not like the visuals of the new value. Can this look better? |
|
Maybe make the 1% low value a bit brighter. It can be difficult to read in game. |
| Real m_low1PercentFPS; ///<1% low fps. | ||
| Real m_currentFPS; ///<current fps value. | ||
|
|
||
| enum { FPS_HISTORY_SIZE = 5000 }; // covers 5s at 1000 FPS, degrades gracefully beyond |
There was a problem hiding this comment.
The implementation approach is fine, but the data size requirements for the fps counters are outrageous.
I inquired a bit with Chat Gippy and it would be possible to reduce size requirements by quantizing samples and moving intermediate samples into timed buckets.
Here is a sample FrameBucket with avg and min frame time, stored in 2 bytes each, with a resolution of 16 micro seconds.
// Compact frametime statistics bucket.
//
// Stores average and maximum frametime values using 16-bit integers
// with fixed-point quantization.
//
// ------------------------------------------------------------------
// Encoding
// ------------------------------------------------------------------
//
// Values are stored in units of 16 microseconds:
//
// stored_value = frametime_us / 16
//
// This allows a uint16_t to represent:
//
// 65535 * 16 us = 1,048,560 us
// = ~1048 ms
//
// which comfortably covers extremely slow frames (~1 FPS).
//
// ------------------------------------------------------------------
// Precision
// ------------------------------------------------------------------
//
// Quantization step:
//
// 16 us = 0.016 ms
//
// Maximum quantization error:
//
// +/- 8 us
//
// This is effectively negligible for FPS telemetry.
//
// Example:
//
// 16.667 ms frame (60 FPS)
//
// Encoded as:
//
// 16667 us / 16 = 1042
//
// Decoded:
//
// 1042 * 16 = 16672 us
//
// Error:
//
// +5 us = 0.005 ms
//
// ------------------------------------------------------------------
// Why fixed-point quantization?
// ------------------------------------------------------------------
//
// Advantages:
//
// - extremely compact (4 bytes total per bucket)
// - deterministic runtime cost
// - cache friendly
// - no floating-point storage
// - sufficient precision for telemetry
// - supports >1 second frametimes
//
// ------------------------------------------------------------------
// Intended usage
// ------------------------------------------------------------------
//
// Typical bucket generation:
//
// avgFrameUs = accumulatedFrameUs / frameCount
// maxFrameUs = worstFrameUsSeen
//
// Then:
//
// bucket.SetAvgUs(avgFrameUs);
// bucket.SetMaxUs(maxFrameUs);
//
// ------------------------------------------------------------------
// Recommended usage pattern
// ------------------------------------------------------------------
//
// Build buckets over fixed time windows:
//
// e.g. 50 ms or 100 ms
//
// rather than fixed frame counts.
//
// This keeps statistical resolution stable across varying FPS.
//
// ------------------------------------------------------------------
struct FrameBucket
{
uint16_t avg16us;
uint16_t max16us;
static constexpr uint32_t QUANTUM_US = 16;
static constexpr uint32_t QUANTUM_SHIFT = 4;
static constexpr uint32_t MAX_US =
0xFFFFu * QUANTUM_US;
// Encode microseconds into 16 us fixed-point units.
//
// Rounds to nearest unit.
//
// Input is clamped to representable range.
static uint16_t EncodeUs(uint32_t us)
{
if (us > MAX_US)
us = MAX_US;
return static_cast<uint16_t>(
(us + (QUANTUM_US / 2)) >> QUANTUM_SHIFT);
}
// Decode fixed-point units back into microseconds.
static uint32_t DecodeUs(uint16_t v)
{
return static_cast<uint32_t>(v)
<< QUANTUM_SHIFT;
}
void SetAvgUs(uint32_t us)
{
avg16us = EncodeUs(us);
}
void SetMaxUs(uint32_t us)
{
max16us = EncodeUs(us);
}
uint32_t GetAvgUs() const
{
return DecodeUs(avg16us);
}
uint32_t GetMaxUs() const
{
return DecodeUs(max16us);
}
float GetAvgMs() const
{
return static_cast<float>(GetAvgUs()) * 0.001f;
}
float GetMaxMs() const
{
return static_cast<float>(GetMaxUs()) * 0.001f;
}
float GetAvgFPS() const
{
const uint32_t us = GetAvgUs();
return (us > 0)
? (1000000.0f / static_cast<float>(us))
: 0.0f;
}
float GetMinFPS() const
{
const uint32_t us = GetMaxUs();
return (us > 0)
? (1000000.0f / static_cast<float>(us))
: 0.0f;
}
};And then we move these FrameBuckets into a time based array.
Sample implementation:
// ============================================================================
// Rolling 3-second FPS statistics
// ============================================================================
//
// Stores:
//
// 3 seconds @ 10 ms resolution
//
// = 300 buckets
//
// Each bucket summarizes:
//
// - average frametime
// - worst frametime
//
// over a 10 ms interval.
//
// ============================================================================
class FpsHistory
{
public:
static constexpr uint32_t BUCKET_INTERVAL_US = 10000; // 10 ms
static constexpr uint32_t HISTORY_SECONDS = 3;
static constexpr uint32_t BUCKET_COUNT =
(HISTORY_SECONDS * 1000000) / BUCKET_INTERVAL_US;
// ------------------------------------------------------------------------
void Reset()
{
m_writeIndex = 0;
m_bucketCount = 0;
m_accumulatedUs = 0;
m_accumulatedFrames = 0;
m_maxFrameUs = 0;
m_bucketElapsedUs = 0;
std::fill(
std::begin(m_buckets),
std::end(m_buckets),
FrameBucket{});
}
// ------------------------------------------------------------------------
// Add a frame
//
// frameUs:
// frametime in microseconds
//
// Example:
//
// 16.667 ms = 16667 us
//
// ------------------------------------------------------------------------
void AddFrame(uint32_t frameUs)
{
// Accumulate stats for current bucket.
m_accumulatedUs += frameUs;
m_accumulatedFrames++;
if (frameUs > m_maxFrameUs)
m_maxFrameUs = frameUs;
m_bucketElapsedUs += frameUs;
// Emit one or more buckets if enough time elapsed.
while (m_bucketElapsedUs >= BUCKET_INTERVAL_US)
{
EmitBucket();
m_bucketElapsedUs -= BUCKET_INTERVAL_US;
}
}
// ------------------------------------------------------------------------
float GetAverageFPS() const
{
if (m_bucketCount == 0)
return 0.0f;
uint64_t totalUs = 0;
for (uint32_t i = 0; i < m_bucketCount; ++i)
{
totalUs += m_buckets[i].GetAvgUs();
}
const float avgUs =
static_cast<float>(totalUs)
/ static_cast<float>(m_bucketCount);
return avgUs > 0.0f
? (1000000.0f / avgUs)
: 0.0f;
}
// ------------------------------------------------------------------------
// Approximate 1% low FPS
//
// Uses the worst bucket frametimes.
//
// This is intentionally approximate and designed for:
//
// - low memory use
// - deterministic runtime
// - good stutter detection
//
// ------------------------------------------------------------------------
float GetOnePercentLowFPS() const
{
if (m_bucketCount == 0)
return 0.0f;
uint32_t worstUs = 0;
// Find worst bucket maximum frametime.
for (uint32_t i = 0; i < m_bucketCount; ++i)
{
worstUs = std::max(
worstUs,
m_buckets[i].GetMaxUs());
}
return worstUs > 0
? (1000000.0f / static_cast<float>(worstUs))
: 0.0f;
}
private:
// ------------------------------------------------------------------------
void EmitBucket()
{
FrameBucket& bucket =
m_buckets[m_writeIndex];
// Compute average frametime for bucket.
const uint32_t avgUs =
(m_accumulatedFrames > 0)
? static_cast<uint32_t>(
m_accumulatedUs / m_accumulatedFrames)
: 0;
bucket.SetAvgUs(avgUs);
bucket.SetMaxUs(m_maxFrameUs);
// Advance ring buffer.
m_writeIndex =
(m_writeIndex + 1) % BUCKET_COUNT;
if (m_bucketCount < BUCKET_COUNT)
++m_bucketCount;
// Reset accumulators.
m_accumulatedUs = 0;
m_accumulatedFrames = 0;
m_maxFrameUs = 0;
}
private:
FrameBucket m_buckets[BUCKET_COUNT];
uint32_t m_writeIndex = 0;
uint32_t m_bucketCount = 0;
uint64_t m_accumulatedUs = 0;
uint32_t m_accumulatedFrames = 0;
uint32_t m_maxFrameUs = 0;
uint32_t m_bucketElapsedUs = 0;
};
// ============================================================================
// Example usage
// ============================================================================
int main()
{
FpsHistory history;
history.Reset();
// Simulate ~60 FPS.
for (int i = 0; i < 500; ++i)
{
uint32_t frameUs = 16667;
// Simulate occasional stutter.
if ((i % 120) == 0)
{
frameUs = 50000; // 50 ms hitch
}
history.AddFrame(frameUs);
}
printf(
"Average FPS: %.2f\n",
history.GetAverageFPS());
printf(
"Approx 1%% Low FPS: %.2f\n",
history.GetOnePercentLowFPS());
return 0;
}This data organization approach uses just 1200 bytes at most. Bucket intervals could also be longer for even less size.
It loses value accuracy. It optimizes for speed over accuracy.
However, your current implemention would perhaps be more efficient at low frame rates, because less values need reading at small frame counts. I do like that aspect, because low fps runtimes need to do less.
I suggest to look into this more how to find a good balance between size,speed and accuracy. There are many options here.
…ootprint. Fast Indexing: Replaced slow modulo division with fast bitwise index masking. Loop Optimization: Eliminated per-iteration modulo checks from history search loops. Safe Sampling: Clamped minimum frame time to prevent division by zero errors. Removed Branching: Pre-primed timer in display init to remove redundant per-frame branches. Cleaned Capping: Used RenderFpsPreset constant instead of magic zero value for uncapped. Improved Windows: Widened average window and boosted low-percent telemetry update rates.
| void W3DDisplay::addFpsSample(Real elapsedSeconds) | ||
| { | ||
| if (elapsedSeconds <= 0.0f) | ||
| if (elapsedSeconds < 0.0001f) |
There was a problem hiding this comment.
Why is this clamp necessary? In what event is the frame 0 seconds long?
| for (Int i = 0; i < m_historyCount; ++i) | ||
| { | ||
| Int idx = (m_historyOffset - 1 - i + FPS_HISTORY_SIZE) % FPS_HISTORY_SIZE; | ||
| if (idx < 0) idx += FPS_HISTORY_SIZE; |
| static UnsignedInt lastLowUpdate = 0; | ||
| UnsignedInt now = timeGetTime(); | ||
| if (now - lastLowUpdate >= 1000) | ||
| if (now - lastLowUpdate >= 100) // update low 1% metrics at 100ms intervals instead of 1000ms since it is now extremely cheap |
There was a problem hiding this comment.
What does "is now extremely cheap" mean? If it was extremely cheap, it would be updated every frame, but it is updated 10 times a second.
| for (Int i = 0; i < m_historyCount; ++i) | ||
| { | ||
| Int idx = (m_historyOffset - 1 - i + FPS_HISTORY_SIZE) % FPS_HISTORY_SIZE; | ||
| if (idx < 0) idx += FPS_HISTORY_SIZE; |
There was a problem hiding this comment.
Is this branch much better than the arithmetic from before?
| if (now - lastLowUpdate >= 100) // update low 1% metrics at 100ms intervals instead of 1000ms since it is now extremely cheap | ||
| { | ||
| lastLowUpdate = now; | ||
| m_low1PercentFPS = calculateLow1PercentFPS(3.0f); |
There was a problem hiding this comment.
How about instead of updating every 100 ms, lazily update it when getLow1PercentFPS is called. Then it will be updated just once a second tops exactly when needed.
| m_historyCount = 1; | ||
| m_lastUpdateTime64 = 0; | ||
| m_currentFPS = 30.0f; | ||
| for (Int h = 0; h < FPS_HISTORY_SIZE; ++h) |
There was a problem hiding this comment.
Can also use std::fill one after the other.
| Real m_currentFPS; ///<current fps value. | ||
|
|
||
| enum { FPS_HISTORY_SIZE = 5000 }; // covers 5s at 1000 FPS, degrades gracefully beyond | ||
| enum { FPS_HISTORY_SIZE = 4096 }; ; // degrades gracefully beyond this size |
| Int idx = (m_historyOffset - 1 - i + FPS_HISTORY_SIZE) % FPS_HISTORY_SIZE; | ||
| if (idx < 0) idx += FPS_HISTORY_SIZE; | ||
| timeSum += m_durationHistory[idx]; | ||
| m_sortBuffer[sampleCount++] = m_fpsHistory[idx]; |
There was a problem hiding this comment.
Maybe the sort buffer can be on the stack instead? Should be a lot cheaper.
|
|
||
| m_currentFPS = 1.0f / elapsedSeconds; | ||
| m_fpsHistory[m_historyOffset] = m_currentFPS; | ||
| m_durationHistory[m_historyOffset] = elapsedSeconds; |
There was a problem hiding this comment.
Why do we store both, fps and frametime, when they can be inferred from each other?
| enum { FPS_HISTORY_SIZE = 5000 }; // covers 5s at 1000 FPS, degrades gracefully beyond | ||
| enum { FPS_HISTORY_SIZE = 4096 }; ; // degrades gracefully beyond this size | ||
| Real m_fpsHistory[FPS_HISTORY_SIZE]; | ||
| Real m_durationHistory[FPS_HISTORY_SIZE]; |
There was a problem hiding this comment.
Try store 2 byte unsigned values for quantized microseconds (use 16 microseconds precision).
Together with the other suggestions, the member data size would then shrink from 49 kb to 8 kb.
Quick-bench shows quantized integer will be 6 to 7 times faster than the float version (gcc, clang, O3).
https://quick-bench.com/q/CnVv3pcyDt3r1ikoGCELHiWgzqM
#include <array>
#include <cstdint>
#include <random>
static void FloatArrayAccumulate(benchmark::State& state) {
std::array<float, 4096> values;
std::mt19937 rng(12345);
std::uniform_real_distribution<float> dist(0.0f, 1.0f);
// Fill array with random floats
for (float& v : values) {
v = dist(rng);
}
for (auto _ : state) {
// Accumulate values
float sum = 0.0f;
for (float v : values) {
sum += v;
}
benchmark::DoNotOptimize(sum);
}
}
BENCHMARK(FloatArrayAccumulate);
static void UShortArrayAccumulateMul16(benchmark::State& state) {
std::array<unsigned short, 4096> values;
std::mt19937 rng(12345);
std::uniform_int_distribution<unsigned short> dist(0, 65535);
// Fill array with random unsigned shorts
for (unsigned short& v : values) {
v = dist(rng);
}
static constexpr uint32_t QUANTUM_SHIFT = 4; // represents 16 microseconds
for (auto _ : state) {
// Accumulate values shifted by 4
uint64_t sum = 0;
for (unsigned short v : values) {
sum += static_cast<uint64_t>(v) << QUANTUM_SHIFT;
}
benchmark::DoNotOptimize(sum);
}
}
BENCHMARK(UShortArrayAccumulateMul16);
This PR adds a 1% low FPS metric to the existing FPS counter HUD, displayed in parentheses next to the average FPS. The 1% low is a standard performance metric used to surface frame time spikes that the average FPS hides. Inspired by #1942.
Added 1% low FPS display to HUD counter
Added m_renderFpsLowString and supporting UI members
Added RenderFpsLowColor configuration to InGameUI INI
Increased history to 5,000 time-bounded frames
Implemented rolling 0.5s window for average FPS
Implemented rolling 3.0s window for 1% lows
The following screenshot from AOD Cobalt Rush shows the 1% low FPS overlay compared to CapFrameX (an external benchmarking tool, centered right), demonstrating the value of surfacing this metric separately from the average.
This change was generated with AI assistance. All generated code has been reviewed, tested, and verified for correctness. The implementation went through multiple iterations, including fundamental changes to the underlying approach, as well as passes to apply simplifications, fix inconsistencies, and optimize performance. Both Generals and GeneralsMD implementations are included in this PR with identical code.